function, such as

\varepsilon = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2    (3.33)
Denoting all model parameters using a vector notation w and a data vector x, the estimation of w is an optimisation process

\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \left\{ \frac{1}{N}\sum_{i=1}^{N}\left( f(\mathbf{x}_i, \mathbf{w}) - y_i \right)^2 \right\}    (3.34)
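As a minimal illustration of equations (3.33) and (3.34), the Python sketch below evaluates the mean-squared error for a hypothetical one-neuron model; the form of f(x, w), the toy data, and the grid search standing in for the optimisation process are all assumptions made for illustration.

import numpy as np

def f(x, w):
    # A hypothetical nonlinear model: a single neuron with a
    # tanh activation, f(x, w) = tanh(w[0] * x + w[1]).
    return np.tanh(w[0] * x + w[1])

def mse(w, x, y):
    # Equation (3.33): epsilon = (1/N) * sum_i (y_hat_i - y_i)^2
    y_hat = f(x, w)
    return np.mean((y_hat - y) ** 2)

# Toy data, assumed for illustration only.
x = np.linspace(-2.0, 2.0, 50)
y = np.tanh(1.5 * x - 0.5)          # generated by a "true" w = (1.5, -0.5)

# Equation (3.34): w_hat is the w that minimises the mean-squared error.
# A coarse grid search stands in for the optimisation process here.
grid = [(w0, w1) for w0 in np.linspace(0, 3, 31)
                 for w1 in np.linspace(-2, 2, 41)]
w_hat = min(grid, key=lambda w: mse(w, x, y))
print(w_hat, mse(w_hat, x, y))      # close to (1.5, -0.5), error near 0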
Because this f(x, w) is a nonlinear function, there will not be an analytical (closed-form) expression for the estimated model parameters such as the one available for linear models. Therefore, an algorithm called the backward propagation algorithm has been developed to estimate parameters for an MLP model [Rumelhart, et al., 1986]. The backward propagation algorithm is a variant of Newton's method [Fletcher, 1987]. It is based on the derivative of an error function, as defined in the equation shown below, where 0 < η < 1 is called the learning rate,
\Delta\mathbf{w} = -\eta \frac{\partial \varepsilon}{\partial \mathbf{w}}    (3.35)
The use of the above equation means that the change (update) of every model parameter is negatively proportional to the derivative of the error function, w_{t+1} = w_t − η ∂ε_t/∂w_t.
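Continuing the sketch above, the update rule of equation (3.35) can be written as a plain gradient-descent loop. Here the derivative ∂ε/∂w is approximated by central finite differences rather than computed by the backward propagation algorithm itself, and the starting point, learning rate, and iteration count are illustrative assumptions.

def numerical_gradient(loss, w, eps=1e-6):
    # Central-difference approximation of d(loss)/dw for each parameter.
    w = np.asarray(w, dtype=float)
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (loss(w_plus) - loss(w_minus)) / (2 * eps)
    return grad

eta = 0.1                           # learning rate, 0 < eta < 1
w_t = np.array([0.2, 0.2])          # arbitrary starting point (assumption)
for t in range(500):
    grad = numerical_gradient(lambda w: mse(w, x, y), w_t)
    w_t = w_t - eta * grad          # eq. (3.35): w_{t+1} = w_t - eta * de/dw
print(w_t)                          # moves toward the true (1.5, -0.5)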
A large derivative points to a steep or sharp segment of the error function curve. Hence, a cautious update of the model parameters is required; this cautious move can avoid missing the optimal point (jumping over the optimal point) of the error function curve. A small derivative corresponds to a more flattened segment of the error function curve. Hence, a greedy or greater update can happen to the model parameters. However, this simple update may cause oscillation, i.e. the over-update jumps over the saddle point of the error function curve forward and